Method and apparatus for promoting memory read commands

ABSTRACT

A device for providing data includes a data source, a bus interface, a data buffer, and control logic. The bus interface is coupled to a plurality of control lines of a bus and adapted to receive a read request targeting the data source. The control logic is adapted to determine if the read request requires multiple data phases to complete based on the control lines, and to retrieve at least two data phases of data from the data source and store them in the data buffer in response to the read request requiring multiple data phases to complete. A method for retrieving data includes receiving a read request on a bus. The bus includes a plurality of control lines. It is determined if the read request requires multiple data phases to complete based on the control lines. At least two data phases of data are retrieved from a data source in response to the read request requiring multiple data phases to complete. The at least two data phases of data are stored in a data buffer.

BACKGROUND OF THE INVENTION

[0001] 1. Field of the Invention

[0002] This invention relates generally to communication between devices on different buses of a computer system, and, more particularly, to a method and apparatus for promoting memory read commands and advantageously prefetching data to reduce bus latency.

[0003] 2. Description of the Related Art

[0004] Computer systems of the PC type typically employ an expansion bus to handle various data transfers and transactions related to I/O and disk access. The expansion bus is separate from the system bus or from the bus to which the processor is connected, but is coupled to the system bus by a bridge circuit.

[0005] A variety of expansion bus architectures have been used in the art, including the ISA (Industry Standard Architecture) expansion bus, an 8-MHz, 16-bit bus, and the EISA (Extended ISA) bus, a 32-bit bus clocked at 8 MHz. As performance requirements increased with faster processors and memory and greater video bandwidth needs, high-performance bus standards were developed. These standards included the Micro Channel architecture, a 10-MHz, 32-bit bus; an enhanced Micro Channel, using a 64-bit data width and 64-bit data streaming; and the VESA (Video Electronics Standards Association) bus, a 33-MHz, 32-bit local bus specifically adapted for a 486 processor.

[0006] More recently, the PCI (Peripheral Component Interconnect) bus standard was proposed by Intel Corporation as a longer-term expansion bus standard specifically addressing burst transfers. The original PCI bus standard has been revised several times, with the current standard being Revision 2.1, available from the PCI Special Interest Group, located in Portland, Oreg. The PCI Specification, Rev. 2.1, is incorporated herein by reference in its entirety. The PCI bus provides for 32-bit or 64-bit transfers at 33 or 66 MHz. It can be populated with adapters that require fast access to each other and/or to system memory, and that the host processor can access at speeds approaching that of the processor's native bus. A 64-bit, 66-MHz PCI bus has a theoretical maximum transfer rate of 528 MByte/sec (8 bytes per clock at 66 MHz). All read and write transfers over the bus may be burst transfers. The length of the burst may be negotiated between initiator and target devices, and may be any length.
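The 528 MByte/sec figure is straightforward arithmetic: one full-width transfer per clock with no wait states or protocol overhead. The following Python sketch computes the theoretical peak for the width and clock combinations mentioned above; it assumes nominal clock values (33 MHz rather than the actual 33.33 MHz) and is illustrative only.

```python
def peak_mbyte_per_s(width_bits: int, clock_mhz: int) -> int:
    """Theoretical PCI peak: one full-width transfer per clock, no wait states.

    Uses nominal clock values, so 33 MHz stands in for the actual 33.33 MHz.
    """
    bytes_per_clock = width_bits // 8
    # bytes per clock x millions of clocks per second = MByte/s
    return bytes_per_clock * clock_mhz


for width in (32, 64):
    for mhz in (33, 66):
        print(f"{width}-bit @ {mhz} MHz: {peak_mbyte_per_s(width, mhz)} MByte/s")
```

Real throughput is lower, since address phases, turnaround cycles, and wait states all consume clocks that carry no data.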

[0007] A CPU operates at a much faster clock rate and data access rate than most of the resources it accesses via a bus. In earlier processors, such as those commonly available when the ISA and EISA buses were designed, this delay in reading data from a resource on the bus was handled by inserting wait states. When a processor requested data that was not immediately available due to a slow memory or disk access, the processor merely marked time using wait states, doing no useful work, until the data finally became available. To make use of this delay time, a processor such as the Pentium Pro (P6), offered by Intel Corporation, provides a pipelined bus that allows multiple transactions to be pending on the bus at one time, rather than requiring one transaction to be finished before starting another. The P6 bus also allows split transactions, i.e., a request for data may be separated from the delivery of the data by other transactions on the bus. The P6 processor uses a technique referred to as “deferred transaction” to accomplish the split on the bus. In a deferred transaction, a processor sends out a read request, for example, and the target sends back a “defer” response, meaning that the target will send the data onto the bus, on its own initiative, when the data becomes available.

[0008] The PCI bus specification as set forth above does not provide for split transactions. There is no mechanism for issuing a “deferred transaction” signal, nor for the target to deliver the deferred data on its own initiative. Accordingly, while a P6 processor can use deferred transactions to communicate with resources such as main memory that are on the processor bus itself, this technique is not used when communicating with disk drives, network resources, compatibility devices, etc., on an expansion bus.

[0009] The PCI bus specification, however, provides a protocol for issuing delayed transactions. Delayed transactions use a retry protocol to implement efficient processing of the transactions. If an initiator initiates a request to a target and the target cannot provide the data quickly enough, the target signals a retry. The retry directs the initiator to retry, or “ask again” for, the data at a later time. In the delayed transaction protocol, the target does not simply sit idly by, awaiting the renewed request. Instead, the target initially records certain information, such as the address and command type associated with the initiator's request, and begins to assemble the requested information in anticipation of a retry request from the initiator. When the request is retried, the information can be quickly provided without unnecessarily tying up the system's buses.
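The delayed-transaction handshake described above can be modeled in a few lines. The Python sketch below is a toy model under stated assumptions, not a PCI implementation: the class name `DelayedReadTarget`, the `fetch_step` method standing in for the slow data access, and the all-ones default for unmapped addresses are all hypothetical. It shows the target latching the address and command on the first attempt, answering with a retry, and completing the same request on a later retry once the data has been assembled.

```python
from __future__ import annotations


class DelayedReadTarget:
    """Toy model of a target using the PCI delayed-transaction protocol.

    The first attempt at a read is answered with a retry while the target
    latches the request (address and command) and starts fetching the data.
    A later retry of the same request completes once the data is ready.
    """

    RETRY = "RETRY"

    def __init__(self, memory: dict[int, int]):
        self.memory = memory
        # (address, command) -> fetched data, or None while still being assembled
        self.pending: dict[tuple[int, str], int | None] = {}

    def request(self, address: int, command: str) -> int | str:
        key = (address, command)
        if key not in self.pending:
            self.pending[key] = None  # record the request and begin assembling data
            return self.RETRY
        data = self.pending[key]
        if data is None:
            return self.RETRY  # retried too early: data not ready yet
        del self.pending[key]  # the delayed request completes and is retired
        return data

    def fetch_step(self) -> None:
        """Stand-in for the slow access: fill in data for every pending request."""
        for key in list(self.pending):
            if self.pending[key] is None:
                address, _command = key
                # Hypothetical default of all ones for addresses not in `memory`.
                self.pending[key] = self.memory.get(address, 0xFFFFFFFF)


target = DelayedReadTarget({0x1000: 0xCAFE})
print(target.request(0x1000, "MR"))       # first attempt: RETRY
print(target.request(0x1000, "MR"))       # retried before data is ready: RETRY
target.fetch_step()
print(hex(target.request(0x1000, "MR")))  # retry after the fetch: 0xcafe
```

Keying the pending table on both address and command mirrors the idea that the target must recognize the retry as the same request it recorded earlier.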

[0010] Differentiated commands are used in accordance with the PCI specification to indicate, or at least hint at, the amount of data required by the initiator. A memory read (MR) command does not provide any immediate indication as to the length of the intended read. The read is terminated based on logic signals driven on the bus by the initiator. A memory read line (MRL) command, on the other hand, indicates that the initiator intends to read at least one cache line (e.g., 32 bytes) of data. A memory read multiple (MRM) command indicates that the initiator is likely to read more than one cache line of data. Based on the command received, the bridge prefetches data and stores it in a buffer in anticipation of the retried transaction. The amount of data prefetched depends on the amount the initiator is likely to require. Efficiency is highest when the amount of prefetched data most closely matches the amount of data required.
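A bridge's command-based prefetch policy can be expressed as a small lookup. The Python sketch below is one plausible policy, not the one any particular bridge uses. It assumes a 32-bit data path (4 bytes per data phase), the 32-byte cache line from the text, and a hypothetical `MRM_LINES` setting for how many lines to fetch on an MRM command.

```python
CACHE_LINE_BYTES = 32  # example cache-line size used in the text
DATA_PHASE_BYTES = 4   # assumes a 32-bit data path
MRM_LINES = 4          # hypothetical: number of cache lines fetched for MRM (tunable)


def prefetch_bytes(command: str, address: int) -> int:
    """Bytes a bridge might prefetch for a delayed read, by PCI command type.

    MR gives no length hint, so only one data phase is fetched. MRL fetches
    to the end of the addressed cache line; MRM fetches through the end of
    MRM_LINES cache lines, starting with the addressed one.
    """
    to_line_end = CACHE_LINE_BYTES - address % CACHE_LINE_BYTES
    if command == "MR":
        return DATA_PHASE_BYTES
    if command == "MRL":
        return to_line_end
    if command == "MRM":
        return to_line_end + (MRM_LINES - 1) * CACHE_LINE_BYTES
    raise ValueError(f"not a memory read command: {command}")


print(prefetch_bytes("MRL", 0x108))  # 24: from offset 8 to the end of the line
```

The MR case is the weak point that the rest of this document addresses: one data phase is the conservative answer, and it is frequently too little.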

[0011] Prefetching in response to MRL and MRM commands is relatively uncomplicated because, by the very nature of the command, the bridge knows to prefetch at least one, and likely more than one, cache line. The amount of data required by an initiator of an MR command, on the other hand, is not readily apparent. Initiators may issue MR commands even if they know they will require multiple data phases. For example, the PCI specification recommends, but does not require, that initiators use an MRL or an MRM command only if the starting address lies on a cache line boundary. Accordingly, a device following this recommendation would issue one or more MR commands until a cache line boundary is encountered, and would then issue the appropriate MRL or MRM command. Also, some devices, due to their vintage or their simplicity, are not equipped to issue MRL or MRM commands, and use MR commands exclusively.
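The behavior of an initiator that follows the cache-line-boundary recommendation can be sketched in Python as follows. The function name `plan_commands` and the thresholds are illustrative assumptions: it uses MRM when the aligned remainder spans two or more 32-byte lines, MRL when it spans one, and MR for a short tail. The unaligned head of the read always goes out as an MR command, which is exactly the case where a bridge cannot tell how much data will follow.

```python
CACHE_LINE_BYTES = 32  # example cache-line size from the text


def plan_commands(start: int, length: int) -> list[tuple[str, int, int]]:
    """Split a read of `length` bytes at `start` into (command, address, bytes).

    An unaligned head is read with MR up to the first cache-line boundary;
    the aligned remainder uses MRM if it spans two or more lines, MRL if it
    spans one line, and MR for a short tail. Thresholds are illustrative.
    """
    plan = []
    head = (-start) % CACHE_LINE_BYTES  # bytes until the next line boundary
    if head:
        n = min(head, length)
        plan.append(("MR", start, n))
        start, length = start + n, length - n
    if length >= 2 * CACHE_LINE_BYTES:
        plan.append(("MRM", start, length))
    elif length >= CACHE_LINE_BYTES:
        plan.append(("MRL", start, length))
    elif length > 0:
        plan.append(("MR", start, length))
    return plan


print(plan_commands(0x108, 100))  # [('MR', 264, 24), ('MRM', 288, 76)]
```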

[0012] To illustrate the difficulty of anticipating the amount of data required by the initiator of an MR command, FIGS. 1A through 1D provide timing diagrams of exemplary MR transactions on a PCI bus. For clarity, only those PCI control signals useful in illustrating the examples are shown. The PCI bus uses shared address/data (AD) lines and shared command/byte enable (C/BE#) lines. In accordance with the PCI specification, a turnaround cycle is required on all signals that may be driven by more than one agent. In the case of the AD lines, the initiator drives the address and the target drives the data. The turnaround cycle is used to avoid contention when one agent stops driving a signal and another agent begins driving it. A turnaround cycle is indicated on the timing diagrams as two arrows pointing at each other's tails.

[0013] FIG. 1A illustrates an MR command in which the initiator requires multiple data phases to complete the transaction. In this illustration, the target and initiator reside on the same PCI bus, and the target is ready to supply the data when requested. The initiator asserts a FRAME# signal before the rising edge of a first clock cycle (CLK1) to indicate that valid address and command bits are present on the AD lines and the C/BE# lines, respectively. During a third cycle, CLK3, the initiator asserts the IRDY# signal to indicate that it is ready to receive data. The target also asserts the TRDY# signal at CLK3 (i.e., after the turnaround cycle) to signal that valid data is present on the AD lines. In accordance with the PCI specification, the initiator must deassert FRAME# before the last data phase. Because the FRAME# signal remains asserted at CLK3, the target knows that more data is required. Data transfer continues between the initiator and target during cycles CLK4 and CLK5. The initiator deasserts the FRAME# signal before CLK5 to indicate that Data3 is the last data phase. The initiator continues to assert the IRDY# signal until after the last data phase has been completed.

[0014] FIG. 1B illustrates an MR command in which the initiator requires only one data phase to complete the transaction. Again, the initiator asserts the FRAME# signal before the rising edge of the first clock cycle (CLK1) to indicate that valid address and command bits are present on the AD lines and the C/BE# lines, respectively. During the third cycle, CLK3, the initiator asserts the IRDY# signal to indicate that it is ready to receive data. The target asserts the TRDY# signal at CLK3 (i.e., after the turnaround cycle) to signal that valid data is present on the AD lines. Because the initiator must deassert FRAME# before the last data phase, the FRAME# signal is deasserted before CLK3. The target then knows that no more data is required. The initiator continues to assert the IRDY# signal during the transfer of the data at CLK3, and deasserts it thereafter.
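The rule used in FIGS. 1A and 1B can be checked against sampled signal levels. In the Python sketch below (a simplified model, not a bus monitor), each `Sample` records FRAME#, IRDY#, and TRDY# on one rising clock edge, with `True` meaning asserted (driven low). A data phase completes on any clock where IRDY# and TRDY# are both asserted, and it is the last one if FRAME# is already deasserted. The traces are transcriptions of the two figures as described in the text.

```python
from typing import NamedTuple


class Sample(NamedTuple):
    """Levels sampled on one rising CLK edge; True means asserted (driven low)."""
    frame: bool
    irdy: bool
    trdy: bool


def count_data_phases(samples: list[Sample]) -> tuple[int, bool]:
    """Count completed data phases and report whether the final one was seen.

    A data phase completes on any clock where IRDY# and TRDY# are both
    asserted; if FRAME# is already deasserted on that clock, it was the last.
    """
    phases = 0
    for s in samples:
        if s.irdy and s.trdy:
            phases += 1
            if not s.frame:
                return phases, True
    return phases, False


# CLK1 address phase, CLK2 turnaround, then the data phases of FIGS. 1A and 1B.
fig_1a = [Sample(True, False, False), Sample(True, False, False),
          Sample(True, True, True), Sample(True, True, True), Sample(False, True, True)]
fig_1b = [Sample(True, False, False), Sample(True, False, False), Sample(False, True, True)]
print(count_data_phases(fig_1a))  # (3, True)
print(count_data_phases(fig_1b))  # (1, True)
```

Until the clock on which FRAME# is seen deasserted, the target can only conclude "at least one more phase", which is why the total is not known early in the transaction.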

[0015] The examples of FIGS. 1A and 1B make clear that the amount of data required by the initiator may not be known until well into the transaction. FIGS. 1A and 1B illustrate MR transactions between devices on the same PCI bus. FIGS. 1C and 1D illustrate an MR transaction where the target resides on a different PCI bus than the initiator and is subordinate to a bridge device.

[0016] As shown in FIG. 1C, the initiator asserts the FRAME# signal before the rising edge of the first clock cycle (CLK1) to indicate that valid address and command bits are present on the AD lines and the C/BE# lines, respectively. The bridge claims the transaction and, because no data is readily available, forces a retry by asserting the STOP# signal during CLK2. In response to the STOP# signal, the initiator deasserts the FRAME# signal before CLK3. The bridge then deasserts STOP# at CLK4. The bridge, not knowing how much data the initiator requires, conservatively assumes the transaction is a single data phase transaction and retrieves only that data.

[0017] At some later time, as shown in FIG. 1D, the initiator retries the request. Again, the initiator asserts the FRAME# signal before the rising edge of the first clock cycle (CLK1) to indicate that valid address and command bits are present on the AD lines and the C/BE# lines, respectively. The bridge, now in possession of the data, allows the transaction to proceed. During the third cycle, CLK3, the initiator asserts the IRDY# signal to indicate that it is ready to receive data. The bridge asserts the TRDY# signal at CLK3 to signal that valid data is present on the AD lines. The bridge also asserts the STOP# signal at CLK3 to indicate it cannot provide any further data. Even though the initiator desired more than one data phase to complete the transaction, as indicated by the FRAME# signal being asserted during the transfer of Data1, the transaction is terminated.

[0018] The initiator is then forced to issue a new transaction, in accordance with FIG. 1C, for the next data phase. The cycle of FIGS. 1C and 1D repeats until the initiator has received its requested data. The situation of FIGS. 1C and 1D illustrates an inefficiency introduced by the use of an MR command. It may take many such exchanges to complete the data transfer, thus increasing the number of tenancies (i.e., exchanges between an initiator and a target) on the bus. Also, the initiator, bridge, and target must compete for bus time with other devices on their respective buses, which increases the total number of cycles required to complete the transaction beyond those required just to complete the exchanges of FIGS. 1C and 1D.
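The cost of the FIGS. 1C/1D pattern can be counted directly. The Python sketch below is a simple model: each data phase costs one retried request plus one single-phase completion, i.e., two tenancies per phase. It compares this with a hypothetical bridge that returns up to `phases_per_fetch` data phases per completion. It deliberately ignores arbitration delays, which, as noted above, only add to the unpromoted cost.

```python
def tenancies_unpromoted(data_phases: int) -> int:
    """FIGS. 1C/1D pattern: each data phase costs a retried request plus a
    one-phase completion, i.e. two tenancies per phase."""
    return 2 * data_phases


def tenancies_prefetched(data_phases: int, phases_per_fetch: int) -> int:
    """Hypothetical: each retry/completion pair returns up to phases_per_fetch."""
    pairs = -(-data_phases // phases_per_fetch)  # ceiling division
    return 2 * pairs


print(tenancies_unpromoted(16))     # 32
print(tenancies_prefetched(16, 8))  # 4
```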

[0019] Techniques have been developed in the art to attempt to increase the efficiency of MR transactions traversing bridges. One such technique involves storing an MR promotion bit for each of the devices subordinate to a bridge in the private configuration space of the bridge. If the bit is asserted, MR commands are automatically promoted, and multiple data phases of data are prefetched. The decision on whether to set the promotion bit depends on knowledge of the device being accessed. Certain devices have undesirable read “side effects.” For example, an address might refer to a first-in-first-out (FIFO) register. A read to a FIFO increments the pointer of the FIFO to the next slot. If the prefetching conducted in response to the assertion of the promotion bit hits the address of the FIFO, the pointer is incremented, and a subsequent read targeting the FIFO retrieves the wrong data, possibly causing undesirable operation or a deadlock condition. Memory regions with such undesirable side effects are referred to as non-speculative regions, and memory regions where prefetching is allowable are referred to as speculative memory regions.
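The FIFO hazard is easy to reproduce in a model. In the Python sketch below, `FifoRegister` pops an entry on every read, and `MR_PROMOTION` is a hypothetical stand-in for per-device promotion bits kept in the bridge's configuration space. The device names and the four-phase promoted fetch are illustrative assumptions. The final lines show what a speculative two-phase fetch would do to a driver that asked for a single value.

```python
from collections import deque


class FifoRegister:
    """A memory-mapped FIFO: every read of its address pops the next entry."""

    def __init__(self, entries):
        self._entries = deque(entries)

    def read(self):
        return self._entries.popleft() if self._entries else None


# Hypothetical per-device MR promotion bits, as a bridge might keep them in
# its private configuration space (device names are illustrative only).
MR_PROMOTION = {"disk_controller": True, "fifo_device": False}


def phases_to_fetch(device: str, command: str, promoted_phases: int = 4) -> int:
    """Promote MR to a multi-phase fetch only when the device's bit is set."""
    if command == "MR" and MR_PROMOTION.get(device, False):
        return promoted_phases
    return 1


# What goes wrong if promotion were (wrongly) enabled for the FIFO:
fifo = FifoRegister([10, 20, 30])
prefetched = [fifo.read() for _ in range(2)]  # the driver asked for one value
driver_value = prefetched[0]                  # 10; the speculatively read 20 is later discarded
next_value = fifo.read()                      # 30: the entry 20 has been lost
print(driver_value, next_value)               # 10 30
print(phases_to_fetch("fifo_device", "MR"))   # 1
```

The weakness of this approach is that the promotion bit is per device, set in advance from knowledge of the device, rather than derived from what the initiator is actually doing on the bus.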

[0020] The present invention is directed to overcoming, or at least reducing the effects of, one or more of the problems set forth above.

SUMMARY OF THE INVENTION

[0021] One aspect of the present invention is seen in a device for providing data. The device includes a data source, a bus interface, a data buffer, and control logic. The bus interface is coupled to a plurality of control lines of a bus and adapted to receive a read request targeting the data source. The control logic is adapted to determine if the read request requires multiple data phases to complete based on the control lines, and to retrieve at least two data phases of data from the data source and store them in the data buffer in response to the read request requiring multiple data phases to complete.

[0022] Another aspect of the present invention is seen in a method for retrieving data. The method includes receiving a read request on a bus. The bus includes a plurality of control lines. It is determined if the read request requires multiple data phases to complete based on the control lines. At least two data phases of data are retrieved from a data source in response to the read request requiring multiple data phases to complete. The at least two data phases of data are stored in a data buffer.

BRIEF DESCRIPTION OF THE DRAWINGS

[0023] The invention may be understood by reference to the following description taken in conjunction with the accompanying drawings, in which like reference numerals identify like elements, and in which:

[0024] FIGS. 1A through 1D illustrate timing diagrams of typical prior art bus commands;

[0025] FIG. 2 is a simplified block diagram of a computer system in accordance with the present invention;

[0026] FIG. 3A is a diagram illustrating typical lines included in a processor bus of FIG. 2;

[0027] FIG. 3B is a diagram illustrating typical lines included in a peripheral component interconnect bus of FIG. 2;

[0028] FIG. 4 is a simplified block diagram of a bridge device of FIG. 2; and

[0029] FIGS. 5 through 7 are timing diagrams of bus transactions in accordance with the present invention.

[0030] While the invention is susceptible to various modifications and alternative forms, specific embodiments thereof have been shown by way of example in the drawings and are herein described in detail. It should be understood, however, that the description herein of specific embodiments is not intended to limit the invention to the particular forms disclosed, but on the contrary, the intention is to cover all modifications, equivalents, and alternatives falling within the spirit and scope of the invention as defined by the appended claims.

DETAILED DESCRIPTION OF SPECIFIC EMBODIMENTS

[0031] Illustrative embodiments of the invention are described below. In the interest of clarity, not all features of an actual implementation are described in this specification. It will of course be appreciated that in the development of any such actual embodiment, numerous implementation-specific decisions must be made to achieve the developers' specific goals, such as compliance with system-related and business-related constraints, which will vary from one implementation to another. Moreover, it will be appreciated that such a development effort might be complex and time-consuming, but would nevertheless be a routine undertaking for those of ordinary skill in the art having the benefit of this disclosure.

[0032] Referring to FIG. 2, a computer system 100 in accordance with the present invention is shown. The computer system 100 includes multiple processors 102 in the illustrated example, although more or fewer may be employed. The processors 102 are connected to a processor bus 104. The processor bus 104 operates based on the processor clock (not shown), so if the processors 102 are 166 MHz or 200 MHz devices (e.g., the clock speed of a Pentium Pro processor), for example, then the processor bus 104 is operated at some multiple of the base clock rate. A main memory 106 is coupled to the processor bus 104 through a memory controller 108. In the illustrated embodiment, the processors 102 each have a level-two cache 110 as a separate chip within the same package as the CPU chip itself, and the CPU chips have level-one data and instruction caches (not shown) included on-chip.

[0033] Host bridges 112, 114 are provided between the processor bus 104 and the PCI buses 116, 118, respectively. Two host bridges 112 and 114 are shown, although it is understood that many computer systems 100 would require only one, and other computer systems 100 may use more than two. In one example, up to four of the host bridges 112, 114 may be used. The reason for using more than one host bridge 112, 114 is to increase the potential data throughput. One of the host bridges 112 is designated as a primary bridge, and the remaining bridges 114 (if any) are designated as secondary bridges.

[0034] The primary host bridge 112, in the illustrated example, carries traffic for “legacy” devices, such as an EISA bridge 120 coupled to an EISA bus 122, a keyboard/mouse controller 124, a video controller 126 coupled to a monitor 128, a flash ROM 130, an NVRAM 132, and a controller 134 for a floppy drive 136 and serial/parallel ports 138. The secondary host bridge 114 does not usually accommodate any PC legacy items. Coupled to the PCI bus 118, which the host bridge 114 connects to the processor bus 104, are other resources such as a SCSI disk controller 140 for hard disk resources 142, 144, and a network adapter 146 for accessing a network 148. A potentially large number of other stations (not shown) are coupled to the network 148. Thus, transactions on the buses 104, 116, 118 may originate in or be directed to another station (not shown) or server (not shown) on the network 148.

[0035] The computer system 100 embodiment illustrated in FIG. 2 is that of a server, rather than a standalone computer system, but the features described herein may be used as well in a workstation or standalone desktop computer. Some components, such as the controllers 124, 140, 146, may be cards fitted into PCI bus slots (not shown) on the motherboard (not shown) of the computer system 100. If additional slots (not shown) are needed, a PCI-to-PCI bridge 150 may be placed on the PCI bus 118 to access another PCI bus 152. The additional PCI bus 152 does not provide additional bandwidth, but allows more adapter cards to be added. Various other server resources, such as CD-ROM drives, tape drives, modems, and connections to ISDN lines for internet access (all not shown), can be connected to the PCI buses 116, 118, 152 using commercially available controller cards.

[0036] Traffic between devices on the concurrent PCI buses 116, 118 and the main memory 106 must traverse the processor bus 104. Peer-to-peer transactions are allowed between a master and target device on the same PCI bus 116, 118, and are referred to as “standard” peer-to-peer transactions. Transactions between a master on one PCI bus 116 and a target device on another PCI bus 118 must traverse the processor bus 104, and these are referred to as “traversing” transactions.

[0037] Referring briefly to FIG. 3A, the processor bus 104 contains a number of standard signal or data lines as defined in the specification for the particular processor 102 being used. In addition, certain special signals are included for the unique operation of the bridges 112, 114. In the illustrated embodiment, the processor bus 104 contains thirty-three address lines 300, sixty-four data lines 302, and a number of control lines 304. Most of the control lines 304 are not required to promote understanding of the present invention, and, as such, are not described in detail herein. Also, the address and data lines 300, 302 have parity lines (not shown) associated with them that are also not described.

[0038] Referring now to FIG. 3B, the PCI buses 116, 118, 152 also contain a number of standard signal and data lines as defined in the PCI specification. The PCI buses 116, 118, 152 are of a multiplexed address/data type, and contain sixty-four AD lines 310, eight command/byte-enable lines 312, and a number of control lines (enumerated below). The particular control lines used in the illustration of the present invention are a frame line 314 (FRAME#), an initiator ready line 316 (IRDY#), a target ready line 318 (TRDY#), a stop line 320 (STOP#), and a clock line 322 (CLK).

[0039] Turning now to FIG. 4, a simplified block diagram showing the host bridge 112 in greater detail is provided. The host bridge 114 is of similar construction to that of the host bridge 112 depicted in FIG. 4. For simplicity, the host bridge 112 is hereinafter referred to as the bridge 112. The bridge 112 includes a processor bus interface circuit 400 serving to acquire data and signals from the processor bus 104 and to drive the processor bus 104 with signals and data. A PCI bus interface circuit 402 serves to drive the PCI bus 116 and to acquire signals and data from the PCI bus 116. Internally, the bridge 112 is divided into an upstream queue block 404 (US QBLK) and a downstream queue block 406 (DS QBLK). The term downstream refers to any transaction going from the processor bus 104 to the PCI bus 116, and the term upstream refers to any transaction going from the PCI bus 116 back toward the processor bus 104. On the upstream side, the bridge 112 interfaces with the processor bus 104, which operates at a bus speed related to the processor clock rate, for example, 133 MHz, 166 MHz, or 200 MHz for Pentium Pro processors 102. On the downstream side, the bridge 112 interfaces with the PCI bus 116 operating at 33 or 66 MHz. These bus frequencies are provided for illustrative purposes. Application of the invention is not limited by the particular bus speeds selected.

[0040] One function of the bridge 112 is to serve as a buffer between the asynchronous buses 104, 116, which also differ in address/data presentation, i.e., the processor bus 104 has separate address and data lines 300, 302, whereas the PCI bus 116 uses multiplexed address and data lines 310. To accomplish these translations, all bus transactions are buffered in FIFOs.

[0041] For transactions traversing the bridge 112, all memory writes are posted writes and all reads are split transactions. A memory write transaction initiated by one of the processors 102 on the processor bus 104 is posted to the processor bus interface circuit 400, and the processor 102 continues with instruction execution as if the write had been completed. A read requested by one of the processors 102 is not immediately implemented, because all of the data storage devices (except for caches) operate much more slowly than the processor, so the reads are all treated as split transactions. An internal bus 408 conveys processor bus 104 write transactions or read data from the processor bus interface circuit 400 to a downstream delayed completion queue (DSDCQ) 410 and its associated RAM 412, or to a downstream posted write queue (DSPWQ) 414 and its associated RAM 416. Read requests going downstream are stored in a downstream delayed request queue (DSDRQ) 418. An arbiter 420 monitors all pending downstream posted writes and read requests via valid bits on lines 422 in the downstream queues 410, 414, 418 and schedules which one will be allowed to execute next on the PCI bus 116 according to the read and write ordering rules set forth in the PCI bus specification. The arbiter 420 is coupled to the PCI bus interface circuit 402 for transferring commands thereto.

[0042] The components of the upstream queue block 404 are similar to those of the downstream queue block 406, i.e., the bridge 112 is essentially symmetrical for downstream and upstream transactions. A memory write transaction initiated by a device on the PCI bus 116 is posted to the PCI bus interface circuit 402 and the master device proceeds as if the write had been completed. A read requested by a device on the PCI bus 116 is not implemented at once by a target device on the processor bus 104, so these reads are again treated as delayed transactions. An internal bus 424 conveys PCI bus write transactions or read data from the PCI bus interface circuit 402 to an upstream delayed completion queue (USDCQ) 426 and its associated RAM 428, or to an upstream posted write queue (USPWQ) 430 and its associated RAM 432. Read requests going upstream are stored in an upstream delayed request queue (USDRQ) 434. An arbiter 436 monitors all pending upstream posted writes and read requests via valid bits on lines 438 in the upstream queues 426, 430, 434 and schedules which one will be allowed to execute next on the processor bus 104 according to the read and write ordering rules set forth in the PCI bus specification. The arbiter 436 is coupled to the processor bus interface circuit 400 for transferring commands thereto.
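The queue assignments of the two preceding paragraphs can be summarized as a lookup. The Python sketch below is a simplification: the function name `queue_for` and the string labels for transaction kinds are hypothetical. It also omits the duplicate-request check against the DCQ and any handling of I/O writes, which are not posted. It returns the FIG. 4 queue that a transaction entering the bridge lands in, by the bus it arrives from.

```python
def queue_for(origin: str, kind: str) -> str:
    """Queue a transaction entering the bridge is placed in, per FIG. 4.

    origin: "processor" (traffic heading downstream to the PCI bus) or
            "pci" (traffic heading upstream toward the processor bus).
    kind:   "memory_write" (posted), "read_request" (delayed), or
            "read_data" (completion data returned for a delayed read).
    """
    downstream = origin == "processor"
    if kind == "memory_write":
        return "DSPWQ" if downstream else "USPWQ"
    if kind == "read_request":
        return "DSDRQ" if downstream else "USDRQ"
    if kind == "read_data":
        return "DSDCQ" if downstream else "USDCQ"
    raise ValueError(f"unhandled transaction kind: {kind}")


print(queue_for("pci", "read_request"))   # USDRQ
print(queue_for("processor", "read_data"))  # DSDCQ: data answering an upstream read
```

Note the pairing that this exposes: read data for an upstream (PCI-initiated) read arrives from the processor bus and therefore lands in the downstream completion queue, and vice versa.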

[0043] The structure and functions of the FIFO buffers or queues in the bridge 112 are now described. Each buffer in a delayed request queue 418, 434 stores a delayed request that is waiting for execution, and this delayed request consists of a command field, an address field, a write data field (not required if the request is a read request), and a valid bit. The USDRQ 434 holds requests originating from masters on the PCI bus 116 and directed to targets on the processor bus 104 or the PCI bus 118. In the illustrated embodiment, the USDRQ 434 has eight buffers, corresponding one-to-one with eight buffers in the DSDCQ 410. The DSDRQ 418 holds requests originating on the processor bus 104 and directed to targets on the PCI bus 116. In the illustrated embodiment, the DSDRQ 418 has four buffers, corresponding one-to-one with four buffers in the USDCQ 426. The DSDRQ 418 is loaded with a request from the processor bus interface circuit 400 and the USDCQ 426. Similarly, the USDRQ 434 is loaded from the PCI bus interface circuit 402 and the DSDCQ 410. Requests are routed through the DCQ 410, 426 logic to identify if a read request is a repeat of a previously encountered request. Thus, a read request from the processor bus 104 is latched into the processor bus interface circuit 400 and the transaction information is applied to the USDCQ 426, where it is compared with all enqueued prior downstream read requests. If the current request is a duplicate, it is discarded if the data is not yet available to satisfy the request. If it is not a duplicate, the information is forwarded to the DSDRQ 418. The same mechanism is used for upstream read requests. Information defining the request is latched into the PCI bus interface circuit 402 from the PCI bus 116, forwarded to the DSDCQ 410, and, if not a duplicate of an enqueued request, forwarded to the USDRQ 434.

[0044] The delayed completion queues 410, 426 and their associated dual port RAMs 412, 428 each store completion status and read data for delayed requests. When a delayable request is sent from one of the interfaces 400 or 402 to the queue block 404 or 406, the appropriate DCQ 410, 426 is queried to see if a buffer for this same request has already been allocated. The address, commands, and byte enables are checked against the buffers in DCQ 410 or 426. If no match is identified, a new buffer is allocated (if available), and the request is delayed (or deferred for the processor bus 104). The request is forwarded to the DRQ 418 or 434 on the opposite side. The request is then executed on the opposite bus 104, 116, under control of the appropriate arbiter 420, 436, and the completion status and data are forwarded back to the appropriate DCQ 410, 426. After status/data are placed in the allocated buffer in the DCQ 410, 426 in this manner, the buffer is not valid until ordering rules are satisfied. For example, a read cannot be completed until previous writes are completed. When a delayable request “matches” a DCQ 410, 426 buffer, and the requested data is valid, the request cycle is ready for immediate completion.
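The match-or-allocate behavior of a delayed completion queue can be sketched as follows. This Python model is illustrative only: the class and method names, the return tuples, and the choice to free a buffer once its data is delivered are assumptions, not the bridge's actual design. It matches on address, command, and byte enables; it answers a duplicate of a still-pending request with a retry; and it completes only once the buffer's data has been filled and marked valid.

```python
from __future__ import annotations

from dataclasses import dataclass


@dataclass
class DcqBuffer:
    key: tuple[int, str, int]   # (address, command, byte enables)
    data: bytes | None = None   # filled when the request executes on the other bus
    valid: bool = False         # set only once ordering rules are satisfied


class DelayedCompletionQueue:
    """Sketch of DCQ matching: reuse a buffer for a repeated request, else allocate."""

    def __init__(self, num_buffers: int):
        self.buffers: list[DcqBuffer | None] = [None] * num_buffers

    def lookup(self, address: int, command: str, byte_enables: int):
        key = (address, command, byte_enables)
        for i, buf in enumerate(self.buffers):
            if buf is not None and buf.key == key:
                if buf.valid:
                    self.buffers[i] = None     # sketch: free the buffer on completion
                    return ("COMPLETE", buf.data)
                return ("RETRY", None)         # duplicate of a pending request
        for i, buf in enumerate(self.buffers):
            if buf is None:
                self.buffers[i] = DcqBuffer(key)
                return ("FORWARD", i)          # delay it; forward to the opposite DRQ
        return ("RETRY", None)                 # no free buffer; try again later

    def fill(self, index: int, data: bytes) -> None:
        self.buffers[index].data = data

    def mark_valid(self, index: int) -> None:
        self.buffers[index].valid = True


dcq = DelayedCompletionQueue(num_buffers=2)
print(dcq.lookup(0x100, "MR", 0xF))  # ('FORWARD', 0)
print(dcq.lookup(0x100, "MR", 0xF))  # ('RETRY', None)
dcq.fill(0, b"DATA")
dcq.mark_valid(0)                    # e.g. once earlier posted writes are flushed
print(dcq.lookup(0x100, "MR", 0xF))  # ('COMPLETE', b'DATA')
```

Separating `fill` from `mark_valid` reflects the point made above: data may be present in the buffer well before ordering rules allow it to be returned.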

[0045] The DSDCQ 410 stores status/read data for PCI-to-host delayed requests, and the USDCQ 426 stores status/read data for host-to-PCI delayed or deferred requests. Each DSDCQ 410 buffer stores eight cache lines (256 bytes of data), and there are eight buffers (total data storage = 2 kB). The four buffers in the USDCQ 426, on the other hand, each store only 32 bytes (i.e., a cache line) of data (total data storage = 128 bytes). The upstream and downstream operation is slightly different in this regard.

[0046] The bridge 112 includes bridge control circuitry 440 that prefetches data into the DSDCQ buffers 410 on behalf of the master, attempting to stream data with zero wait states after the delayed request completes. The DSDCQ 410 buffers are kept coherent with the processor bus 104 via snooping, which allows the buffers to be discarded as seldom as possible. Requests going the other direction may also use prefetching, as described in greater detail below. However, because many PCI memory regions have “read side effects” (e.g., stacks and FIFOs), the bridge control circuitry 440 attempts to prefetch data into these buffers on behalf of the master only under controlled circumstances. In the illustrated embodiment, the USDCQ 426 buffers are flushed as soon as their associated deferred reply completes.

[0047] The posted write queues 414, 430 and their associated dual port RAM memories 416, 432 store commands and data associated with transactions. Only memory writes are posted, i.e., writes to I/O space are not posted. Because memory writes flow through dedicated queues within the bridge, they cannot be blocked by delayed requests that precede them, as required by the PCI specification. Each of the four buffers in the DSPWQ 414 stores 32 bytes (i.e., a cache line) of data plus commands for a host-to-PCI write. The four buffers in the DSPWQ 414 provide a total data storage of 128 bytes. Each of the four buffers in the USPWQ 430 stores 256 bytes of data plus commands for a PCI-to-host write, i.e., eight cache lines (total data storage = 1 kB). Burst memory writes that are longer than eight cache lines may cascade continuously from one buffer to the next in the USPWQ 430. Often, an entire page (e.g., 4 kB) is written from the disk 142 to the main memory 106 in a virtual memory system that is switching between tasks. For this reason, the bridge 112 has more capacity for bulk upstream memory writes than for downstream writes.
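The buffer sizes quoted for the four data-carrying queues can be cross-checked with a little arithmetic. The Python sketch below tabulates (buffers, bytes per buffer) for the illustrated embodiment and derives the totals given in the text. The helper `uspwq_buffer_fills` is a hypothetical addition that counts how many 256-byte USPWQ buffer loads a burst write of a given size cascades through.

```python
CACHE_LINE_BYTES = 32

# (number of buffers, bytes per buffer) in the illustrated embodiment.
QUEUE_SIZES = {
    "DSDCQ": (8, 8 * CACHE_LINE_BYTES),  # PCI-to-host delayed read data
    "USDCQ": (4, CACHE_LINE_BYTES),      # host-to-PCI delayed read data
    "DSPWQ": (4, CACHE_LINE_BYTES),      # host-to-PCI posted writes
    "USPWQ": (4, 8 * CACHE_LINE_BYTES),  # PCI-to-host posted writes
}

totals = {name: count * size for name, (count, size) in QUEUE_SIZES.items()}
print(totals)  # {'DSDCQ': 2048, 'USDCQ': 128, 'DSPWQ': 128, 'USPWQ': 1024}


def uspwq_buffer_fills(burst_bytes: int) -> int:
    """Number of 256-byte USPWQ buffer loads a burst write cascades through."""
    per_buffer = QUEUE_SIZES["USPWQ"][1]
    return -(-burst_bytes // per_buffer)  # ceiling division


print(uspwq_buffer_fills(4096))  # 16 buffer loads for a 4 kB page write
```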

[0048] The arbiters 420 and 436 control event ordering in the QBLKs 404, 406. These arbiters 420, 436 make certain that no transaction in the DRQ 418, 434 is attempted until the posted writes that preceded it are flushed, and that no datum in a DCQ 410, 426 is marked valid until the posted writes that arrived in the QBLK 404, 406 ahead of it are flushed.
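The ordering rule enforced by the arbiters 420, 436 can be modeled with two counters. The following is a minimal Python sketch, assuming that posted writes flush in arrival order (so a single "flushed so far" count suffices); the class `OrderingArbiter` and its methods are hypothetical names chosen for illustration, not the bridge's implementation.

```python
class OrderingArbiter:
    """Hypothetical model of the ordering rule enforced by arbiter 420 or 436.

    Posted writes are assumed to flush in arrival order, so one counter
    records how many of them have reached their destination.
    """

    def __init__(self) -> None:
        self.posted = 0   # posted writes accepted into the QBLK so far
        self.flushed = 0  # posted writes flushed so far, in order

    def post_write(self) -> None:
        self.posted += 1

    def flush_one(self) -> None:
        if self.flushed == self.posted:
            raise RuntimeError("no posted write is pending")
        self.flushed += 1

    def barrier(self) -> int:
        """Record how many posted writes are ahead of a newly arriving request or datum."""
        return self.posted

    def may_proceed(self, barrier: int) -> bool:
        """True once every posted write that preceded the barrier has been flushed."""
        return self.flushed >= barrier


arb = OrderingArbiter()
arb.post_write()
arb.post_write()
req = arb.barrier()          # a delayed request enters the DRQ behind two posted writes
print(arb.may_proceed(req))  # False: the earlier writes are still queued
arb.flush_one()
arb.flush_one()
arb.post_write()             # a write arriving later does not hold the request back
print(arb.may_proceed(req))  # True
```

The same barrier check would apply to marking a DCQ datum valid: the datum records the barrier when it arrives and becomes valid only once `may_proceed` returns true.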

[0049] As described above, there is a risk associated with prefetching data in response to an upstream read command because of potential side effects. However, the conservative approach of never prefetching for upstream reads, as illustrated in FIGS. 1A through 1D, results in costly inefficiencies. The risk of prefetching is lessened if the behavior of the initiator can be predicted. For example, if an initiator issues an MR command and it can be determined that the initiator is requesting more than one data phase of data, prefetching data is less likely to cause an unintended side effect.

[0050] The bridge control circuitry 440, as described in reference to FIGS. 5, 6, and 7, is adapted to detect whether an initiator intends to retrieve multiple phases of data with a burst MR command. There are numerous techniques for making such a determination, and several are described herein for illustrative purposes. As described above, it often takes multiple clock cycles before the behavior of an initiator can be determined. The techniques described below, although using different approaches, attempt to identify the intentions of an initiator with respect to the number of data phases desired and, if possible, to prefetch data to reduce the inefficiencies described above. In response to determining that the initiator intends to complete multiple data phases, the bridge control circuitry 440 prefetches multiple data phases of data and stores them in the appropriate DCQ 410, 426 associated with the transaction.

[0051] A first illustrative technique involves evaluating the behavior of the initiator when the bridge issues a retry request (i.e., by asserting the STOP# signal). FIG. 5 illustrates a timing diagram of a read transaction traversing the bridge 112. The initiator asserts the FRAME# signal before the rising edge of the first clock cycle (CLK1) to indicate that valid address and command bits are present on the AD lines and the C/BE# lines, respectively. The bridge 112 claims the transaction and, because no data is readily available, forces a retry by asserting the STOP# signal during CLK3. When the STOP# signal is asserted, the bridge control circuitry 440 samples the FRAME# signal and the IRDY# signal to determine the intentions of the initiator with respect to the number of data phases requested. As described above in reference to FIG. 1B, an initiator requesting a single data phase must deassert the FRAME# signal before asserting the IRDY# signal to signify that the last data phase is being requested. In FIG. 5, coincident with the STOP# signal, the FRAME# signal and the IRDY# signal are both asserted, indicating that the initiator intends to request multiple data phases. Accordingly, the bridge control circuitry 440 prefetches more than a single data phase of data in anticipation of the impending retry by the initiator. If, instead, the FRAME# signal had been deasserted when the STOP# signal was asserted, the bridge control circuitry 440 would retrieve only one data phase of data. Approaches for determining the amount of data to prefetch are discussed in greater detail below.
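The first technique reduces to a single sample taken on the clock where the target asserts STOP#. Below is a minimal Python sketch, assuming a cycle trace in which `True` means a signal is asserted (driven low); `BusSample`, `phases_at_stop`, and the `burst_phases` parameter are hypothetical, and the traces are loosely patterned on FIG. 5 rather than copied from it.

```python
from dataclasses import dataclass
from typing import Sequence


@dataclass(frozen=True)
class BusSample:
    """Initiator signals at one rising clock edge; True means asserted (driven low)."""
    frame: bool  # FRAME#
    irdy: bool   # IRDY#


def phases_at_stop(trace: Sequence[BusSample], stop_clk: int, burst_phases: int) -> int:
    """Technique 1: choose how many data phases to fetch when STOP# is asserted.

    trace[0] is CLK1; stop_clk is the 1-based clock on which the target asserts STOP#.
    """
    s = trace[stop_clk - 1]
    if s.frame and s.irdy:
        return burst_phases  # FRAME# still asserted with IRDY#: a burst read
    return 1                 # not asserted together (e.g., FRAME# deasserted): one phase


fig5_like = [BusSample(True, False), BusSample(True, True), BusSample(True, True)]
single = [BusSample(True, False), BusSample(False, True), BusSample(False, True)]
print(phases_at_stop(fig5_like, stop_clk=3, burst_phases=8))  # 8
print(phases_at_stop(single, stop_clk=3, burst_phases=8))     # 1
```

The value 8 for `burst_phases` is arbitrary here; how much to prefetch is a separate, application-dependent choice.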

[0052] A second illustrative technique involves monitoring the behavior of the initiator for a predetermined number of clock cycles after the FRAME# signal is asserted to identify whether the initiator commits to multiple data phases. In the illustrated embodiment, the predetermined number of clock cycles is three. FIG. 6 is a timing diagram illustrating this technique. Again, the initiator asserts the FRAME# signal before the rising edge of the first clock cycle (CLK1) to indicate that valid address and command bits are present on the AD lines and the C/BE# lines, respectively. The bridge 112 claims the transaction and monitors the behavior of the initiator to determine whether the initiator commits to multiple data phases on or before the third clock cycle following the assertion of the FRAME# signal (i.e., CLK4). If the initiator does not commit by the third clock cycle, the bridge control circuitry 440 assumes that a single data phase is required and fetches only one data phase of data.

[0053] The PCI specification does not impose a requirement on the initiator to assert the IRDY# signal within a certain number of clock cycles after asserting the FRAME# signal. In FIG. 6, the initiator does not assert the IRDY# signal until after CLK4; thus, at the determination point, the bridge control circuitry 440 determines that the initiator has not committed to a multiple-phase transfer and assumes that a single data phase is required. It is evident from the behavior of the initiator after CLK4 that the initiator intended to transfer during more than one data phase (i.e., the FRAME# signal and the IRDY# signal are both asserted at CLK5), but this intention is not detected by the bridge control circuitry 440. Instead, the bridge control circuitry 440 asserts the STOP# signal at CLK5 in response to the lack of commitment on the part of the initiator by CLK4.

[0054] Had the initiator responded in the manner previously described in FIG. 5, the bridge control circuitry 440 would have detected the initiator's multiple-phase intention at CLK2 and would have asserted the STOP# signal at CLK3, without waiting the predetermined number of clock cycles.

[0055] A tradeoff exists between the number of cycles selected for evaluation and the accuracy of the determination of the initiator's intention. A larger number of clock cycles yields a more accurate prediction, but the determination takes longer to complete.
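The windowed technique of paragraphs [0052] through [0055] can be expressed as a scan over a bounded number of clocks. This is a minimal Python sketch, assuming that CLK1 is the address phase, that STOP# follows the determination point by one clock (as in FIGS. 5 and 6), and that `True` means asserted; `decide_within_window` and the traces are hypothetical illustrations, not the actual FIG. 6 waveform.

```python
from dataclasses import dataclass
from typing import Sequence, Tuple


@dataclass(frozen=True)
class BusSample:
    """Initiator signals at one rising clock edge; True means asserted (driven low)."""
    frame: bool  # FRAME#
    irdy: bool   # IRDY#


def decide_within_window(trace: Sequence[BusSample], window: int = 3) -> Tuple[bool, int]:
    """Technique 2: watch up to `window` clocks after CLK1 for a multi-phase commitment.

    trace[0] is CLK1 (address phase). Returns (multi_phase, stop_clk), where
    stop_clk is the 1-based clock on which STOP# is asserted, one clock after
    the determination point.
    """
    for clk in range(2, window + 2):  # CLK2 .. CLK(window + 1)
        if clk > len(trace):
            break                     # trace ended before the window closed
        s = trace[clk - 1]
        if s.frame and s.irdy:        # committed: FRAME# and IRDY# asserted together
            return True, clk + 1
    return False, window + 2          # no commitment seen: assume a single data phase


fig5_like = [BusSample(True, False), BusSample(True, True)]
fig6_like = [BusSample(True, False)] * 4 + [BusSample(True, True)]
print(decide_within_window(fig5_like))            # (True, 3): detected at CLK2
print(decide_within_window(fig6_like))            # (False, 5): missed, STOP# at CLK5
print(decide_within_window(fig6_like, window=4))  # (True, 6): caught, but one clock later
```

The last two calls illustrate the tradeoff in paragraph [0055]: widening the window from three to four clocks catches the late commitment in the FIG. 6-style trace, at the cost of asserting STOP# one clock later.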

[0056] A third illustrative technique involves simply sampling the FRAME# signal when the initiator asserts the IRDY# signal. If the FRAME# signal is asserted coincident with the IRDY# signal, as at CLK5 of FIG. 7, the initiator has committed to a multiple data phase transfer. Accordingly, the bridge control circuitry 440 asserts the STOP# signal at CLK6, following the positive determination, and proceeds to prefetch multiple phases of data. This technique, although the most accurate, has the potential to introduce the most latency, as the PCI specification imposes no restriction on the time between the assertion of the FRAME# signal and the subsequent assertion of the IRDY# signal.
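The third technique waits for the first IRDY# assertion and reads FRAME# on that same edge. Here is a minimal Python sketch under the same hypothetical conventions as before (`True` means asserted, CLK1 is the first trace entry, STOP# one clock after the determination); `decide_on_irdy` is an illustrative name, and returning `None` simply models a decision that is still pending.

```python
from dataclasses import dataclass
from typing import Optional, Sequence, Tuple


@dataclass(frozen=True)
class BusSample:
    """Initiator signals at one rising clock edge; True means asserted (driven low)."""
    frame: bool  # FRAME#
    irdy: bool   # IRDY#


def decide_on_irdy(trace: Sequence[BusSample]) -> Optional[Tuple[bool, int]]:
    """Technique 3: sample FRAME# on the first clock at which IRDY# is asserted.

    Returns (multi_phase, stop_clk), with stop_clk one clock after that edge,
    or None while IRDY# has not yet been asserted (no decision possible yet).
    """
    for idx, s in enumerate(trace):
        if s.irdy:
            clk = idx + 1           # 1-based clock number of the IRDY# edge
            return s.frame, clk + 1
    return None


fig7_like = [BusSample(True, False)] * 4 + [BusSample(True, True)]
print(decide_on_irdy(fig7_like))  # (True, 6): multi-phase, STOP# at CLK6
print(decide_on_irdy([BusSample(True, False), BusSample(False, True)]))  # (False, 3)
```

Because the function cannot return until IRDY# appears, its latency is unbounded in this model, mirroring the latency caveat in the paragraph above.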

[0057] The choice of how much data to prefetch in response to determining that the initiator intends to complete multiple data phases is application dependent. The bridge control circuitry 440 might prefetch up to the next cache line boundary, the next 512-byte boundary, or the next 4 kB boundary. Alternatively, the amount of data might depend on the available buffer space in the bridge 112.
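The boundary-based policies above amount to a modulo computation. The following minimal Python sketch combines a boundary limit with a free-space cap by taking the smaller of the two; that combination, the 4-byte data phase (a 32-bit PCI bus), and the name `prefetch_bytes` are assumptions for illustration only.

```python
def prefetch_bytes(addr: int, boundary: int, free_bytes: int, phase_bytes: int = 4) -> int:
    """Bytes to prefetch from `addr` up to the next `boundary`, capped by free buffer space.

    Assumes 4-byte data phases (32-bit PCI) by default and rounds down to whole
    data phases. An address already on a boundary yields a full `boundary` block.
    """
    if boundary <= 0 or phase_bytes <= 0:
        raise ValueError("boundary and phase_bytes must be positive")
    to_boundary = boundary - (addr % boundary)
    amount = min(to_boundary, free_bytes)
    return amount - (amount % phase_bytes)


print(prefetch_bytes(0x1004, 32, 2048))   # 28: rest of the 32-byte cache line
print(prefetch_bytes(0x1004, 512, 2048))  # 508: up to the next 512-byte boundary
print(prefetch_bytes(0x1004, 4096, 256))  # 256: limited by free buffer space
```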

[0058] To further safeguard against unintentionally prefetching from a region with read side effects, a device in the computer system 100 that knowingly accesses a non-speculative region should be restricted to using only single data phase MR commands. In other words, multiple data phase read commands should be reserved for accessing known speculative memory regions.

[0059] The bridge includes a configuration register 442 for selectively enabling or disabling the MR promotion function of the bridge control circuitry 440 for any or all of the PCI slots (not shown) subordinate to the bridge 112. The configuration register 442 stores, in its private configuration space, a plurality of MR promotion bits, one for each subordinate device. During power-up, configuration software executing on the computer system 100 may choose to enable or disable the MR promotion function for each of the slots. The configuration software determines the type of device installed and may compare this determination against a list of devices known to function well with MR promotion or, alternatively, against a list of devices known to have problems with MR promotion.
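The per-slot enable bits can be pictured as a bitmask that configuration software builds from an allow list or a deny list. This is a minimal Python sketch under several assumptions: devices are identified by a hypothetical (vendor ID, device ID) pair, bit n of the mask corresponds to slot n, an allow list takes precedence when both lists are supplied, and empty slots stay disabled. The function names and ID values are invented for illustration.

```python
from typing import Dict, Optional, Set, Tuple

DeviceId = Tuple[int, int]  # hypothetical (vendor ID, device ID) identification


def build_promotion_mask(
    slots: Dict[int, Optional[DeviceId]],
    known_good: Optional[Set[DeviceId]] = None,
    known_bad: Optional[Set[DeviceId]] = None,
) -> int:
    """Return MR promotion bits: bit n set means promotion is enabled for slot n.

    If `known_good` is given it acts as an allow list; otherwise `known_bad`
    acts as a deny list. Empty slots (None) are left disabled.
    """
    mask = 0
    for slot, dev in slots.items():
        if dev is None:
            continue
        if known_good is not None:
            enabled = dev in known_good
        else:
            enabled = dev not in (known_bad or set())
        if enabled:
            mask |= 1 << slot
    return mask


def promotion_enabled(mask: int, slot: int) -> bool:
    """Bit test the bridge logic would perform before promoting an MR command."""
    return bool((mask >> slot) & 1)


NIC = (0x01AB, 0x0001)          # hypothetical device IDs
LEGACY_FIFO = (0x02CD, 0x0002)
slots = {0: NIC, 1: LEGACY_FIFO, 2: None}
mask = build_promotion_mask(slots, known_bad={LEGACY_FIFO})
print(bin(mask))  # 0b1: promotion enabled for slot 0 only
```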

[0060] Although the preceding description focused on the application of the MR promotion techniques in a bridge 112, it is contemplated that the technique may be employed in any number of devices. For example, the hard disk resource 142, 144 may have a high latency as compared to the other devices accessing it. The hard disk resource 142, 144 may implement a buffering technique at least partially similar to that used in the bridge 112, wherein a retry is forced while the data is buffered. The hard disk resource 142, 144 may advantageously use the MR promotion techniques described herein to reduce latencies and/or tenancies on its associated bus 118. Similar latency issues may be encountered when devices resident on the network 148 access data present somewhere on the computer system 100. Accordingly, the network adapter 146 may advantageously implement MR promotion techniques. As such, MR promotion may be used in peer-to-peer transactions as well as in traversing transactions. Generally speaking, any device controlling data may implement MR promotion techniques in response to any received read transaction for which data is not immediately available.

[0061] The particular embodiments disclosed above are illustrative only, as the invention may be modified and practiced in different but equivalent manners apparent to those skilled in the art having the benefit of the teachings herein. Furthermore, no limitations are intended to the details of construction or design herein shown, other than as described in the claims below. It is therefore evident that the particular embodiments disclosed above may be altered or modified and all such variations are considered within the scope and spirit of the invention. Accordingly, the protection sought herein is as set forth in the claims below.

What is claimed:
 1. A bridge device for communicating between a first and a second bus, comprising: a bus interface coupled to a plurality of control lines of the first bus and adapted to receive a read request targeting the second bus; a data buffer; and control logic adapted to determine if the read request requires multiple data phases to complete based on the control lines, and to retrieve at least two data phases of data from the second bus and store them in the data buffer in response to the read request requiring multiple data phases to complete.
 2. The bridge device of claim 1, wherein the control lines include a stop line, and the control logic is adapted to assert a stop signal on the stop line after determining that the read request requires multiple data phases.
 3. The bridge device of claim 1, wherein the control lines include a frame line and an initiator ready line, and the control logic is adapted to sample a frame signal on the frame line and an initiator ready signal on the initiator ready line to determine if the read request requires multiple data phases.
 4. The bridge device of claim 3, wherein the control logic is adapted to determine that the read request requires multiple data phases in response to the frame signal and the initiator ready signal being asserted concurrently.
 5. The bridge device of claim 4, wherein the control logic is adapted to determine that the read request requires multiple data phases in response to the frame signal and the initiator ready signal being asserted concurrently within a predetermined number of clock cycles.
 6. The bridge device of claim 5, wherein the control logic is adapted to retrieve only one phase of data in response to the frame signal and the initiator ready signal not being asserted concurrently within the predetermined number of clock cycles.
 7. The bridge device of claim 4, wherein the control logic is adapted to sample the frame signal and the initiator ready signal in response to the initiator ready signal being asserted.
 8. The bridge device of claim 7, wherein the control logic is adapted to retrieve only one phase of data in response to the frame signal and the initiator ready signal not being asserted concurrently when the initiator ready signal is asserted.
 9. The bridge device of claim 3, wherein the control lines include a stop line, and the control logic is adapted to assert a stop signal on the stop line in response to data corresponding to the read request not being stored in the data buffer.
 10. The bridge device of claim 9, wherein the control logic is adapted to sample the frame signal and the initiator ready signal when asserting the stop signal.
 11. The bridge device of claim 10, wherein the control logic is adapted to retrieve only one phase of data in response to the frame signal and the initiator ready signal not being asserted concurrently when the stop signal is asserted.
 12. The bridge device of claim 1, wherein the control logic is adapted to retrieve a plurality of data phases of data in response to the read request requiring multiple data phases until a cache line boundary is reached.
 13. The bridge device of claim 1, wherein the control logic is adapted to retrieve a plurality of data phases of data in response to the read request requiring multiple data phases until the data buffer is full.
 14. The bridge device of claim 5, wherein the predetermined number of cycles is between two and five.
 15. The bridge device of claim 5, wherein the predetermined number of cycles is at least two.
 16. A device for providing data, comprising: a data source; a bus interface coupled to a plurality of control lines of a bus and adapted to receive a read request targeting the data source; a data buffer; and control logic adapted to determine if the read request requires multiple data phases to complete based on the control lines, and to retrieve at least two data phases of data from the data source and store them in the data buffer in response to the read request requiring multiple data phases to complete.
 17. The device of claim 16, wherein the control lines include a stop line, and the control logic is adapted to assert a stop signal on the stop line after determining that the read request requires multiple data phases.
 18. The device of claim 16, wherein the control lines include a frame line and an initiator ready line, and the control logic is adapted to sample a frame signal on the frame line and an initiator ready signal on the initiator ready line to determine if the read request requires multiple data phases.
 19. The device of claim 18, wherein the control logic is adapted to determine that the read request requires multiple data phases in response to the frame signal and the initiator ready signal being asserted concurrently.
 20. The device of claim 19, wherein the control logic is adapted to determine that the read request requires multiple data phases in response to the frame signal and the initiator ready signal being asserted concurrently within a predetermined number of clock cycles.
 21. The device of claim 20, wherein the control logic is adapted to retrieve only one phase of data in response to the frame signal and the initiator ready signal not being asserted concurrently within the predetermined number of clock cycles.
 22. The device of claim 19, wherein the control logic is adapted to sample the frame signal and the initiator ready signal in response to the initiator ready signal being asserted.
 23. The device of claim 22, wherein the control logic is adapted to retrieve only one phase of data in response to the frame signal and the initiator ready signal not being asserted concurrently when the initiator ready signal is asserted.
 24. The device of claim 18, wherein the control lines include a stop line, and the control logic is adapted to assert a stop signal on the stop line in response to data corresponding to the read request not being stored in the data buffer.
 25. The device of claim 24, wherein the control logic is adapted to sample the frame signal and the initiator ready signal when asserting the stop signal.
 26. The device of claim 25, wherein the control logic is adapted to retrieve only one phase of data in response to the frame signal and the initiator ready signal not being asserted concurrently when the stop signal is asserted.
 27. The device of claim 16, wherein the control logic is adapted to retrieve a plurality of data phases of data in response to the read request requiring multiple data phases until a cache line boundary is reached.
 28. The device of claim 16, wherein the control logic is adapted to retrieve a plurality of data phases of data in response to the read request requiring multiple data phases until the data buffer is full.
 29. The device of claim 16, wherein the data source comprises at least one of a second bus, a disk drive, and a network.
 30. The device of claim 20, wherein the predetermined number of cycles is between two and five.
 31. The device of claim 20, wherein the predetermined number of cycles is at least two.
 32. A method for retrieving data, comprising: receiving a read request on a bus, the bus including a plurality of control lines; determining that the read request requires multiple data phases to complete based on the control lines; retrieving at least two data phases of data from a data source in response to the read request requiring multiple data phases to complete; and storing the at least two data phases of data in a data buffer.
 33. The method of claim 32, wherein the control lines include a stop line, and the method further includes asserting a stop signal on the stop line after determining that the read request requires multiple data phases.
 34. The method of claim 32, wherein the control lines include a frame line and an initiator ready line, and determining that the read request requires multiple data phases includes: sampling a frame signal on the frame line; and sampling an initiator ready signal on the initiator ready line.
 35. The method of claim 34, wherein determining that the read request requires multiple data phases includes determining that the frame signal and the initiator ready signal are asserted concurrently.
 36. The method of claim 35, wherein determining that the read request requires multiple data phases includes determining that the frame signal and the initiator ready signal are asserted concurrently within a predetermined number of clock cycles.
 37. The method of claim 36, further comprising retrieving only one phase of data in response to the frame signal and the initiator ready signal not being asserted concurrently within the predetermined number of clock cycles.
 38. The method of claim 35, wherein determining that the read request requires multiple data phases includes sampling the frame signal and the initiator ready signal in response to the initiator ready signal being asserted.
 39. The method of claim 38, further comprising retrieving only one phase of data in response to the frame signal and the initiator ready signal not being asserted concurrently when the initiator ready signal is asserted.
 40. The method of claim 34, wherein the control lines include a stop line, and the method further comprises asserting a stop signal on the stop line in response to data corresponding to the read request not being stored in the data buffer.
 41. The method of claim 40, wherein determining that the read request requires multiple data phases includes sampling the frame signal and the initiator ready signal when asserting the stop signal.
 42. The method of claim 41, further comprising retrieving only one phase of data in response to the frame signal and the initiator ready signal not being asserted concurrently when the stop signal is asserted.
 43. The method of claim 32, wherein retrieving the at least two data phases of data includes retrieving a plurality of data phases of data until a cache line boundary is reached.
 44. The method of claim 32, wherein retrieving the at least two data phases of data includes retrieving a plurality of data phases of data until the data buffer is full.
 45. The method of claim 32, wherein retrieving the at least two data phases of data from the data source includes retrieving the at least two data phases of data from at least one of a second bus, a disk drive, and a network.
 46. The method of claim 36, wherein determining that the frame signal and the initiator ready signal are asserted concurrently within a predetermined number of clock cycles includes determining that the frame signal and the initiator ready signal are asserted concurrently within two to five clock cycles.
 47. The method of claim 36, wherein determining that the frame signal and the initiator ready signal are asserted concurrently within a predetermined number of clock cycles includes determining that the frame signal and the initiator ready signal are asserted concurrently within at least two clock cycles.
 48. A computer system, comprising: a first bus having a plurality of control lines; a second bus; a target device coupled to the second bus; an initiating device coupled to the first bus and being adapted to initiate a read request targeting the target device; and a bridge device for communicating between the first and second buses, comprising: a data buffer; and control logic adapted to receive the read request, determine if the read request requires multiple data phases to complete based on the control lines, retrieve at least two data phases of data from the target device, and store the at least two data phases of data in the data buffer in response to the read request requiring multiple data phases to complete.
 49. The computer system of claim 48, wherein the control lines include a stop line, and the control logic is adapted to assert a stop signal on the stop line after determining that the read request requires multiple data phases.
 50. The computer system of claim 48, wherein the control lines include a frame line and an initiator ready line, the initiating device is adapted to assert a frame signal on the frame line and an initiator ready signal on the initiator ready line, and the control logic is adapted to sample the frame signal and the initiator ready signal to determine if the read request requires multiple data phases.
 51. The computer system of claim 50, wherein the control logic is adapted to determine that the read request requires multiple data phases in response to the frame signal and the initiator ready signal being asserted concurrently.
 52. The computer system of claim 51, wherein the control logic is adapted to determine that the read request requires multiple data phases in response to the frame signal and the initiator ready signal being asserted concurrently within a predetermined number of clock cycles.
 53. The computer system of claim 52, wherein the control logic is adapted to retrieve only one phase of data in response to the frame signal and the initiator ready signal not being asserted concurrently within the predetermined number of clock cycles.
 54. The computer system of claim 51, wherein the control logic is adapted to sample the frame signal and the initiator ready signal in response to the initiator ready signal being asserted.
 55. The computer system of claim 54, wherein the control logic is adapted to retrieve only one phase of data in response to the frame signal and the initiator ready signal not being asserted concurrently when the initiator ready signal is asserted.
 56. The computer system of claim 50, wherein the control lines include a stop line, and the control logic is adapted to assert a stop signal on the stop line in response to data corresponding to the read request not being stored in the data buffer.
 57. The computer system of claim 56, wherein the control logic is adapted to sample the frame signal and the initiator ready signal when asserting the stop signal.
 58. The computer system of claim 57, wherein the control logic is adapted to retrieve only one phase of data in response to the frame signal and the initiator ready signal not being asserted concurrently when the stop signal is asserted.
 59. The computer system of claim 48, wherein the control logic is adapted to retrieve a plurality of data phases of data in response to the read request requiring multiple data phases until a cache line boundary is reached.
 60. The computer system of claim 48, wherein the control logic is adapted to retrieve a plurality of data phases of data in response to the read request requiring multiple data phases until the data buffer is full.
 61. The computer system of claim 52, wherein the predetermined number of cycles is between two and five.
 62. The computer system of claim 52, wherein the predetermined number of cycles is at least two.
 63. An apparatus, comprising: means for receiving a read request on a bus, the bus including a plurality of control lines; means for determining that the read request requires multiple data phases to complete based on the control lines; means for retrieving at least two data phases of data from a data source in response to the read request requiring multiple data phases to complete; and means for storing the at least two data phases of data.